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Abstract. This introductory presentation describes the Overlap Dirac Operator, 
O ' why it could be useful in numerical QCD, and how it can be implemented. 

o 

^ I 1 Introduction 

^>. ' The objective of this talk is to introduce a certain matrix, the Overlap Dirac 

f^ I Operator. The ideas behind this construction are rather deep, having to do 

■^ ' with the self-consistency of chiral gauge theories |^ . Chiral gauge theories are 

^^ I a class of quantum field theoretical models from which one draws the minimal 

standard model of particle physics. This model describes all phenomena at 
distances larger than about 10~^* meters, excluding gravity. 
^\ • The Overlap Dirac Operator may be useful also in numerical studies of 

the strong interaction component of the minimal standard model. Quantum 
Chromo-Dynamics (QCD). QCD, in isolation, is a vector-like gauge theory, a 
simple combination of chiral gauge theories within which some of the mathe- 
matical complexities of the general class disappear. Nevertheless, the strong 
^ , nonlinearity of QCD makes the quantitative evaluation of its most basic ob- 

servable features possible only within large scale numerical Monte Carlo sim- 
ulations. This workshop will deal mainly with numerical QCD, the subfield 
of lattice field theory which focuses on such simulations. 

Therefore, the form of the Overlap Dirac Operator will be motivated 
hcuristically, based on technical considerations of numerical QCD. The heuris- 
tic motivation will start from the continuum Euclidean Dirac operator, whose 
basic features of direct relevance to numerics will be reviewed. As alluded 
above, this is not the original way the Overlap Dirac Operator was found, 
but, for the focus of this workshop it is unnecessary to go through the entire 
construction. Rather, I shall start by reviewing an important technicality, 
the light quark bottleneck, which is a serious problem in contemporary tra- 
ditional numerical QCD, argue that there is no fundamental reason why 
this bottleneck cannot be avoided and present the Overlap Dirac Operator 
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as a potential solutionR The solution is not yet working efficiently because 
another bottleneck appears. However, the new bottleneck may be easier to 
circumvent than the old one. 

An effort has been made to make this presentation accessible to applied 
mathematicians. Physicists practicing numerical QCD, in particular when the 
Overlap Dirac Operator is used, are facing issues that the expertise of spe- 
cialists in numerical linear algebra could be of great help in resolving. These 
same physicists will likely be familiar with large portions of the presentation 
below, perhaps excepting the way the Overlap Dirac Operator is arrived at. 



2 The numerical bottleneck of light quarks 

The QCD Lagrangian with correct parameters should produce the observed 
ratio of masses ^^^ = 0.18. The relative smallness of this number reflects the 

nip 

relative small mass of two of the lightest quarks in Nature. This mass is not 
only small relatively to other quark masses, but, more importantly, relative to 
a basic scale of QCD itself, a mass parameter known as Aqcd- Theoretically, 
one has very good reasons to expect to be able to smoothly extrapolate in 
the light quark masses from, say, ^^^ = 0.25 to the smaller physical value. 
Actually, even the functional form of these extrapolations is known. However, 
it is not likely that the extrapolation will go smoothly through the value 
^^^^ = 0.5. As long as ^-^^ > 0.5 the decay p ^ 27r is energetically forbidden 
but once this threshold is passed the decay becomes allowed. This provides 
physical reasons to expect a "bumpy" behaviorg in the extrapolation to light 
quark masses in the region where ^^^^ ^0.5 and there are no first principle 
based theoretical expressions parameterizing this "bumpy" region. Current 
large scale numerical work happens to be close to the threshold ^^^ = 0.5, 
and there are numerical difficulties obstructing straightforward attempts to 
go significantly lower. 



Other solutions than the Overlap Dirac Operator have been proposed for this 
specific problem; using the Overlap Dirac Operator is probably the most funda- 
mentally motivated approach, but the cost may be high, and at the bottom line 
this consideration often takes precedence. 

Some quantities may be insensitive to the p — > 27r threshold effects, typically 
for kinematic reasons. Also, there is an additional approximation which is very 
often employed (the so called "valence", or "quenched" approximation) which 
eliminates decay effects on the vacuum of the theory. On the other hand, one often 
relies on an effective low energy description of QCD, in which the Lagrangian 
is replaced by an "effective Lagrangian" involving only tt's and some other light 
associated particles. The existence of a particle like the p and its connection to 
the 27r state is then encoded in the coefficients of the "effective Lagrangian" and 
will affect the accuracy of chiral perturbation theory at moderate energies. Thus, 
physical observables related to "weak matrix elements" are an example where 
one might suspect sensitivity to threshold effects. 



In numerical work one uses a lattice whose spacing a is measured in 
inverse mass units. Thus any number of the type mass x a is a pure number, 
of the kind a computer can handle. The threshold ratio is obtained, with 
current simulation parameters, for rriT^a ^ 0.1 and for quark mass rrig given by 
TTigfl ~ 0.05. Nothing is lost by setting a = 1; dimensional considerations can 
always restore the right power of a. Physical considerations imply m^ ex y^m^ 
for small niq. On the other hand, rup stays finite (basically of the order of 
Aqcd) at TTiq — and consequently its numerical dependence on niq for small 
ruq is weaker. So, in order to reduce ^^^^ by a factor of 2 as we would like, 
we have to reduce the light quark masses by about a factor of 4. The bottom 
line is that we would like to leave all other simulation parameters the same, 
only change the quark mass to something like rriqa = 0.01. 

Simple arguments indicate that we should be able to do that. The quark 
mass enters as an important parameter in the lattice Dirac Operator, D. 
D and D'l'D are matrices that need to be inverted on vectors in the most 
inner procedure of the simulations. In traditional simulations Z? is a sparse 
matrix; full storage is out of the question, as the dimension of D is too large. 
The quark mass directly affects the condition number of D, so its value can 
easily become the bottleneck of the entire simulation. But, we could easily 
imagine being lucky enough to manage the needed value of ruqa = 0.01: D is a 
discretized first order partial differential operator and, since space time is four 
dimensional one can easily think of D obeying \\D\\ < 4 as a uniform upper 
bound. On the other hand, mq enters additively, with unit coefficient and, 
based on the continuum formula one would expect that D is approximately 
anti-hermitian at zero quark mass. Therefore, we expect HD^Z?!! > m^ and, 
when we invert D^ D, a condition number of the order 10^, which should allow 
the evaluation of 7570 V' in something like 500-1000 iterations. 

Up to relatively short lived jerks, computational power is projected to 
increase by Moore's Law for another decade, so we can expect an enhance- 
ment by a factor of 250 at fixed power dissipation by 2010. Based on the 
accumulated numerical experience to date, we can reasonably expect that, 
by then, we would be able to carry out our computations with relative ease. 

However, the sad fact is that theoretical issues have apparently made it 
impossible to find a simple enough D which would obey the rather plausible 
lower bound ||D^D|| > m^. This was a key assumption I made before in order 
to come up with the condition number of order 10^. What really happens in 
the traditional approach^] is that D^ + D is not small and to get ^^^ ^ .3 

one is forced into a regime where the lowest eigenvalue of D^D quite often 
fluctuates as low as 10~^. This makes the condition number go to 10^*^ and 
puts the ^^^^ ^ .3 mass ratio out of numerical reach. 

The above numerical problem reflects a difficulty of principle associated 
with the rriq = 0, or massless, case. In the continuum model, rUq = is 



^ To be specific, the approach employing Wilson fermions; there also exists a tra- 
ditional "staggered fermion" alternative, but it suffers from other problems. 



associated with an extra symmetry, chirality. Technically, chirality ensures 
that in the continuum the massive Dirac operator D + mriq (where I reserved 
D for the massless case) indeed obeys ||I?^-D|| > m^. But, chirality cannot be 
preserved on the lattice in the same way as some other symmetries can. For 
twenty years it was thought that an acceptable lattice version of D which 
also preserves chirality does not exist. The good news is that a series of 
developments, started in 92 by Kaplan and by Frolov and Slavnov pi and 
built on by Narayanan and myself, have produced, in 97, the Overlap Dirac 
Operator Do on the lattice, which, effectively has the extra symmetry. While 
Do is easy to write down, it is not easy to implement because it no longer 
is sparse. There nevertheless is some hope, because, in the most common 
implementation of the Overlap ideas. Do is what probably would be the 
next best thing: up to trivially implemented factors, it is a function (as an 
operator) of a sparse matrixQ Unfortunately, the function has a discontinuity, 
so its implementation is costly. The numerical developments in this subfield 
are relatively fresh and substantial progress has been achieved but we are not 
yet adequately tooled for full "production runs" . Still, since the time we have 
had to address the problem is of the order of a year or two I think further 
substantial progress is likely. 

3 Dirac Operator basics 

The Dirac Operator is a very important object in relativistic Field Theory. 
Among other things, it predicted the existence of the electron anti-particle, 
the positron. All of known matter is governed by the Dirac equation. 

3.1 Continuum 

The Dirac Operator is defined in the continuum (Euclidean space) by 

^c = ^7p(aM + »^p) , (1) 

/J 

where, 

7^ = Ta^; a* = 1, 2, 3, 4; {7^, 7^} = 7^7^ + 7^7,, = 25^^ . (2) 

We also have, 

A^ = A^^; trAfj^ = 0; A^ are 3 by 3 matrices. (3) 

The notation d^ indicates d/dxf^. A^ is z-dependent, but the 7^ are constant 
four by four matrices and operate in a separate space. Thus, Dc is a twelve 

* The additional "trivial" factors have a highly non-trivial numerical impact: be- 
cause of them the inverse of Do no longer has a simple expression in terms of a 
function of a sparse matrix. 



by twelve matrix whose entries are first order partial differential operators 
on a four dimensional Euclidean space, henceforth assumed to be a flat four 
torus. The massive Dirac Operator is rric + D^- 
The main properties of the Dirac Operator are: 

1. For A^ — one can set by Fourier transform d^ = ip^ giving Dc = i^^p^- 
This produces D^ = ~J2ij.Pu from the algebra of the 7-matrices. The 
important consequence this has is that Dc = iff p^ = for all /i. 

2. Define 75 = 71727374 which implies 7I = 1 and 757^ + 7;j75 = 0. The 
property of 75-hermiticity is that Dc is hermitian under an indefinite 
inner product defined by 75, or, equivalently, using the standard L2 inner 
product, we have for any backgrounds A^ and any mass parameter TOc 

-f5{Dc + mc)j5 -- {Dc + mc)'' . (4) 

Thus, 75-hermiticity is a property of the massive Dirac Operator. 

3. At zero mass the Dirac Operator is anti-hermitian Dl = —Dc This prop- 
erty, in conjunction with 75-hermiticity implies the symmetry property of 
chirality: j^Dcjs — ~Dc usually written as {Dc, 75} — 0. Traditionally, 
anti-hermiticity would be viewed as a separate and trivial property and 
chirality as a symmetry. Then, 75 hermiticity would be a consequence 
holding not only for the massless Dirac Operator, but also for non-zero 
mass. But, for better comparison to the lattice situation it is somewhat 
better to exchange cause and consequence, as done here. 



3.2 Lattice 

Numerical QCD spends most of the machine cycles inverting a lattice version 
of the massive Dirac Operator. This is how the continuum Dirac operator is 
discretized: 

The continuum x is replaced by a discrete lattice site x on a torus. The 
matrix valued functions A^ are replaced by unitary matrices Ufj,{x) of unit 
determinant, [/^(x) is associated with the link I going from x into the positive 
/x-direction, ft. 

The connection to the continuum is as follows: Assume that the functionsn 
A^(x) are given. Then, 

U — limAT^oo |"e''''^^''^^)/^e""^*'(^+'''/^e*°'^^''^^+^°)/^ . . . e*a^M(^+(^~i)a))/^l 
= Pexp[i J^dXf^Afj^{x)] (the symbol P denotes "path ordering") .(5) 

There are two sets of basic unitaries acting on twelve component vectors 

V'(x): 

^ Actually, the Afj,{x) aren't really functions, rather they make up a one form 
y^ A^{x)dxij, which is a connection on a possibly nontrivial bundle with struc- 
ture group SU{3) over the four-torus. This complication is important for the case 
of massless quarks, but shall be mostly ignored in the following. 



1. The point wise acting 7^'s and 

2. the directional parallel transporters T^, defined by: 

r^(V')(x) = C/^(x)^(x + /i) . (6) 

There also is a third class of unitaries carrying out gauge transformations. 
Gauge transformations are characterized by a collection oi g{x) e SU{3) and 
act on -0 pointwise. The action is represented by the unitary G{g), so that 
{G{g)ip){x) — g{x)'ijj{x). Probably the most important property of the T^ 
operators is that they are "gauge covariant" , 

G{g)T^{U)G\g)^T^{Un , (7) 

where, 

U^^ix) = g{x)U^{x)g\x + ft) . (8) 

The variables Ufi{x) are stochastic, distributed according to a probability 
density that is invariant under U —^ U^ for any g. 

The lattice replacement of the massive continuum Dirac Operator, D{m), 
is an element in the algebra generated by T^, T^, 7^. Thus, D{m) is gauge 
covariant. For Ufi{x) = 1 the T^ become commuting shift operators. One 
can preserve several crucial properties of the continuum Dirac Operator by 
choosing a D{m) which satisfies: 

1. Hypercubic symmetry. This discrete symmetry group consists of 24 per- 
mutations scrambling the fj, indices (which leave D{m) unchanged) com- 
bined with 16 reflections made out of the following four elementary op- 
erations: 

l5lMT^. ^^ Tl){m)j^j5 - D{m) . (9) 

2. Correct low momentum dependence. When the commutators [T^, T^] = 
one can simultaneously diagonalize the T^ unitaries, with eigenvalues e* f . 
For small 9 angles, one gets D{m) ^ m + jX^uT/j^m + 0{9^). One also 
needs to require that for zero mass (to = 0) D' D be bounded away from 
zero for all 9 which are outside a neighborhood R of zero. 

3. Locality and smoothness in the gauge field variables. One can also require, 
at least for gauge backgrounds with small enough (in norm) commutators 
[T^,T[^], that D be a convergent series in the T^. 

The simplest solution to the above requirements is the Wilson Dirac Op- 
erator, Dw- It is the sparsest possible realization of the above. Fixing a par- 
ticular free parameter (usually denoted by r) to its preferred value (r = 1) 
Dw can be written as: 

Dw^m + 4-J2v,; V^V, ^ 1; V, ^ ^^T, + ^^T^ . (10) 



For TO > clearly ||-Dw|| > m, but to get the physical quark mass to zero, 
y^phys _ Q^ Qj^j, needs to take into account an additive renormalization in- 
duced by the fluctuations in the gauge background U^{x): m must be chosen 
as TO = mc{fi) < where /3 is the lattice version of the gauge coupling con- 
stant governing the fluctuations of the background. The need to use a nega- 
tive TO, on the lattice opens the door for very small eigenvalues for D^^Dy^r 
and thus for terrible condition numbers. To get a small but nonzero physical 
quark mass one should choose m ~ mc{(3) + Am, Am > but the numbers 
are such that one always ends up with an overall negative m. 



3.3 A simplified "no-go theorem". 

These problems would go away if we had chirality, because it would single out 
the TO = values as special and remove the additive renormalization. But, this 
cannot be done. There are rather deep reasons for why this cannot be done, 
and several versions of "no-go theorems" exist |g|. Here, I shall present only a 
very simple version, namely, one cannot supplement the above requirements 
of D{m) (fulfilled by Dw) with the requirement 751)75 — —D at to = 0. The 
proof is very simple: Take the particular case where the T^ commute. Com- 
pounding four elementary reflections we get ^^D{0)^5 = D{—9) = —D{9). 
This shows that any one of the sixteen solutions 6*;^ = 0, tt has D{9*) — 0, 
since, by periodicity tt — — tt. This violation of one of the original require- 
ments amounts to a drastic multiplication of the number of Dirac particles: 
instead of one we end up with sixteen ! 



4 Overlap Dirac Operator 

Any regularization deemphasizes the dynamics of short length/time scales. 
Lattice regularizations completely remove arbitrarily high momenta by com- 
pactifying momentum space. Thus, the spectrum of a lattice regularized Dirac 
operator lives on a compact space. When looking for a way around the chi- 
rality "no-go theorem" a possible line of attack is to prepare ahead of time 
for the regularization step by shifting the focus to an operator which already 
has a compact spectrum in the continuum. 

Based on the anti-hermiticity of Dc (which we argued before can be viewed 
as a fundamental property, one that in conjunction with the other fundamen- 
tal property of 75-hermiticity produces chirality as a consequence) we can 
quite naturally define an associated unitary operator Vc using the Cayley 
transform: 

Vc is not only unitary, but also 75-hermitian, inheriting the property from 
Dc- In other words, Vc has the two crucial properties of Dc and these two 
properties would imply chirality for Dc defined in terms of Vc by inverting 



the Cayley transform (the inverse is another Cayley transform). Also, one has 
to note that the "no-go theorem" does not prohibit a lattice version of Vc, 
V, that satisfies both requirements of 75-hermiticity and unitarity. However, 
the D one would construct from such a V would need to violate something - 
the natural choice is that D be non-local. Still, nothing seems to say that V 
itself has to be nonlocal: 

The non-locality in D, mandated by the "no-go theorem" , could reflect merely 
the existence of unit eigenvalues to V. 

The next question is then, suppose we have a lattice V] since D is non- 
local, how can we hope to make progress ? The answer is almost trivial if we 
go back to continuum: we only care about the spectral part of Dc which is 
below some cutoff Ac- So, we are allowed to expand the Cayley transform: 



Vr. = -1 



'Ar. 



o 



Therefore, 



Ar 



l + Vc 



plus unphysical corrections. 



(13) 



(14) 



If the lattice-^ is local there is no locality problem with the lattice-Do being 
given by 

Do = ^Af ■ (15) 

Now, in agreement with the "no-go theorem" , we have lost exact anti-hermiticity, 
and, consequently exact, mathematical, chirality. However, the loss of chiral- 
ity can be made inconsequential, unlike in the traditional Wilson formulation 
of fermions. As a first sign of this let us check that adding a mass term pro- 
duces a lower bound similar to the continuum and thus would protect our 
precious condition number. It is easy to prove that: 



mc 

X 



1 + K 



mc_ 
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1 + vc 



> min 



X 



(16) 



So, we have an expression bounded away from zero as long as ^ 7^ 0,-1. 
Another way to see that we have as much chirality as we need is to calculate 
the anticommutator of Do with 75, which would be zero in the continuum, 
when Dc is used. Actually, in the language of path integrals, we do not really 
need the Dirac Operator itself: rather we need its inverse and its determinant. 
So, it is closer to physics to look at the inverse of the Overlap Dirac Operator 
Do- D-^ obeys 



757^-1 +i?-S = 275 



(17) 



The same equation would also hold in the continuum if we replaced D^ ^ by 
-rru'- There is little doubt that such a replacement in the continuum has no 



physical consequence; after all it only changes D^^ additively, by —1, a Dirac 
delta-function in space-time. While on the lattice we don't have a local and 
exactly chiral D, we do have a Do — ^-^ and we now see that its inverse 
is good enough. In a brilliant paper [HM, published almost twenty years ago, 
Ginsparg and Wilson suggested this relaxation of the chirality condition as 
a way around the "no-go theorem" . What stopped progress was the failure 
of these authors to also produce an explicit formula for Do- Their failure, 
apparently, was taken so seriously, that nobody tried to find an explicit Do, 
and the entire idea fell into oblivion for an embarrassingly long time. Nobody 
even made the relatively straightforward observation that the search for Do 
was algebraically equivalent to a search for a unitary, 75-hermitian operator 
V . Moreover, it seems to have gone under-appreciated that the problem was 
not so much in satisfying the algebraic constraint of the relaxed form of the 
anticommutator (the GW relation), but, in simultaneously maintaining gauge 
covariance, discrete lattice symmetries, and correct low momentum behavior, 
without extra particles. 



4.1 Definition and basic properties 

To guess a lattice formula for V , first notice that, in the continuum, the 
operator ec = 75^4 is a reflection: e^ = 1 and ej = ec- Also, K, and hence Ec, 
are expressed as a ratio of massive Dirac operators. But, we know that there 
is no difficulty in representing on the lattice the massive Dirac Operator. The 
simplest way to do this is to use the Wilson Dirac Operator, Dw- The latter 
obeys 75-hermiticity, so Hw = "JsDw is hermitian. It is then very natural to 
try 

e^signiHw) ■ (18) 

This is not the entire story - we still have one parameter at our disposal, 
the lattice mass, m. When we look at the continuum definition of Vc we 
observe that it can be rewritten as follows: 



' V-D^ + Al D, + A, ■ "■ ^ 

So Vc is given by the product of two unitary operators, one being the unitary 
factor of the Dirac Operator with a large negative mass and the other the 
conjugate of the unitary factor of the Dirac Operator with a large positive 
mass. Formally, the unitary factors are equal to each other up to sign in the 



® Clearly, when going to the lattice, the continuum delta-function can be replaced 
by a Kroneker delta-function or by, say, a narrow Gaussian. This gives a cer- 
tain amount of freedom which I shall ignore below. In the lattice formulation of 
Ginsparg and Wilson this freedom is made explicit by the introduction of a local 
operator R. 



continuum and one has: 

Vc = -U! = -(-^^^\ ; -D'^+Al = iD^-A^)\D^-A^) . (20) 

But, on the lattice, a local unitary operator U replacing Uc cannot exist; if 
it did, D — ^(C/t — U) would violate the "no-go theorem". So, on the lattice 
we can find a V, but there is no local square root of —V: this observation 
is rather deep, and related to anomalies. We need to treat the two unitary 
factors in the continuum expression for Vc differently on the lattice. Actually, 
the factor with a positive mass sign can be replaced by unity by taking 
the limit m -^ oo, something one can do only on the lattice, where the 
T^ operators are bounded. The factor representing negative mass however 
cannot be so simplified because the negative mass argument m is restricted 
to a finite interval (—2 < jti < 0) to avoid extra particles. Thus, on the lattice 
we are led to 

V = ri^yirn) with -2<m<0. (21) 



) , — 

y D\y {171)0 w{m) 



This formula is equivalent to the one for e above. The reason that we get 
exactly massless quarks is that no fine tuning is needed for V to have eigen- 
values very close to ~1. Indeed, Dwiiri) = Dw{0) + rn and, in the case 
[Tfj,,T,y] = 0, Dw{0) is easily seen to have very small eigenvalues. There 
Dyy{m) is dominated by the negative mass m. It is unimportant what the 
exact value of m is, only its sign matters. Even when [Tf^,Ti,] ^ 0, and m is 
additively renormalized, as long as the effective m stays negative, we shall 
have exactly massless quarks as evidenced by the eigenvalues of V close to 
— 1 potentially producing long range correlations in D^^. 

All non-real eigenvalues of V are paired: Vip — e'^'"ip => Vj^ip = e~*"75V'- 
But, one important property of the continuum Vc is that it has an unpaired 
single —1 eigenvalue in a certain class of topologically nontrivial backgrounds. 
This is a characteristic of massless quarks and has to be reproduced on the 
lattice. There V is even dimensional, so a single exact ^ = — 1 state implies 
the existence of another V ~ +1 state. In addition, it is clear that as a result 
of |to| < 2, there are states where Dw{0) dominates over m and these states 
will have eigenvalues far off the real axis. We conclude that the spectrum 
of V can cover the entire complex unit circle. The spectrum of Uc though, 
covers only half the complex unit circle. This is a reflection of V being a 
lattice version of —11^ rather than Uc itself, as one might think naively. 

It was mentioned already that all one needs in the path integral are for- 
mulae for the inverse of the Dirac Operator and for its determinant. It is clear 
that once one has an acceptable V one can form not only the local, but not 
strictly chiral operator Do but also a non-local chiral associate D^ ~ j^ ■ I^- 
the determinant one must use Do and cannot use D^. Otherwise, the extra 



zeros of the determinant coming from the V = 1 states leave an indelible 
effect in the continuum limit. These zeros directly reflect the non- locality of 
D^. However, the inverses of Do can be replaced by inverses of D^ which has 
useful practical implications. In the language of Feynman graphs, the deter- 
minant is represented by closed internal fermion loops, and on those we must 
use D~^ , but the inverses come from external fermions lines, and on those 
we can just as well use D^ "^ = D~^ — 1. This is consistent because the —1 
term can be interpreted as coming from an auxiliary fermionic variable which 
contributes only a unit multiplicative factor to the fermionic determinant. 

To complete the logic of the story let me discuss the introduction of a finite 
quark mass parameter in the context of the Overlap Dirac Operator. To be 
sure there is no additive mass renormalization, one wishes to preserve the 
continuum property that Tr ^^ ,^ is odd in rric', this propertyj] singles out 
the point rric = 0. This property is a direct consequence of 75£'c + £'c75 = 0. 
It is easy to see that for an external fermion line the lattice version would be 

Tr—-^=Tr--—-l); p > 0; (22) 



eP + V Vl + e^'^V' 

with 



m, = ztanh^; (^) ^-E^A + 0(^^)- (23) 



The factor z is somewhere between 1 and 2. One now easily proves that the 
condition number of the matrix ^-^t-tf is i — ttt^ — -^ ■ Therefore, we should 

e.P — V taiih(^) rUq ' 

be able to get to the quark masses we need. However, the matrix e isn't 
sparse, and the evaluation of its action will be time consuming. Still, it is a 
function of a sparse matrix Hw ^ so employing the Overlap Dirac Operator 
in numerical QCD is not ruled out ah initio. 



4.2 Implementation by rational approximants 

The operator e is unambiguously defined only for matrices i/^y that have no 
zero eigenvalues. This is not a restriction in itself, because the variables Ufj_{x) 
in Hw are stochastic and there is no symmetry protecting zero eigenvalues of 
Hyy- So, the probability to encounter an exact zero of Hw is zero. Moreover, 
in the continuum limit Hw will have a relatively large gap around zero, so 
when we are close enough to the continuum we need the sign function only 
over the range of arguments [—a, —b] U [&, a], where a ~ 8 and b should be 
something like 0.1 or 0.5. 

Over the above range it is relatively easy to approximate the sign function 
by simpler functions. Also, for b uniformly (in the background) bounded away 

"^ A less careful introduction of non-zero quark mass in the Overlap Dirac Operator 
requires extra unnecessary subtractions - see Ig . 



from zero, the operator — . "*" is a convergent series in Hw- This ensures 

that the non-sparse Do is nevertheless sufficiently local to be an acceptable 
approximation to a continuum differential operator. 

Mimicking the way transcendental functions are typically implemented 
in computers, we are led to try a rational approximation M]. So, we are 
looking for a series of functions e„(x) which, for any fixed x ^ 0, obeys 
lim„^oo £n (a;) — sign(x). For each finite n, Sn{x) is a ratio of two polyno- 
mials. A sequence with many good properties has been in use by applied 
mathematicians B who were interested in the sign function because of the 
role it plays in control theory. The sequence is given by: 

n I ^^>)2n _ Q _ 2;)2« f l^^l < 1 tanh[2n tanh"\x)] 
=«(^) = (i + ;,)2n + (i_,^)2„ = <^ kl > 1 tanh[2ntanh-i(x-i)] (24) 

So long I a; I is sufficiently far from zero a large enough n can provide an approx- 
imation to the sign function good to machine accuracy. The approximants 
e„(j;) are smooth and maintain some properties of the sign function: 

£n(a;) = -e„(-x) =£„(-); |e„(a;)| < 1; £n(±l) = ±1 • (25) 

X 

To implement en{x) we decompose /(x^) = en{x)/x in simple pole terms: 

'"^'^ ^ « S .^cos^^(.-i)+sin^-(.-i) ■ ^''^ 

One can choose an appropriate scaling parameter A > and approximate 
the sign function of Hw by £„(Aifw)- The action of this approximated e on 
a vector ip can be evaluated by a multi-shift Conjugate Gradient inversion 
algorithm g . The multi-shift trick reduces the computational cost of evalu- 
ating the action of each one of the terms in the pole expansion to the cost of 
evaluating one single inversion, the most time consuming one, which clearly 
is 

A=H?. + .-'*;'" ■ <"' 

Memory requirements are linear in n, since one needs to store a few vectors 
for each pole term. In practice this may be a problem, not as much that the 
entire memory of the machine would be exceeded, but rather that one would 
find oneself exceeding the level 2 cache. Level 2 cache misses can lead to 
substantial performance loss. Thus, it is useful to consider an alternative to 
the standard implementation of the multi-shift trick. Usually, in applications 
of the multi-shift trick, one is really interested in the individual vectors, 
but here we only want their weighted sum. It is quite easy to figure out a 
way to store only a few vectors, and get the sum, but a double pass over the 
basic Conjugate Gradient procedure is now required. Although the number of 



floating point operations is doubled, because of reduced cache miss penalties, 
performance is not necessarily adversely impacted and can actually increase 

[|- 

Other promising methods to implement Do have been developed and will 

be hopefully discussed here later |lQ| . In the implementation employed in M 

the sign function was approximated by a very high order polynomial. 

4.3 A new bottleneck and projectors 

Having removed one apparent obstacle, namely the potentially large memory 
requirements, we turn to a much more substantial obstacle, namely the condi- 
tion number oi H^, k. k determines both the number of Conjugate Gradient 
iterations needed to evaluate the inverse and also the minimal required n for 
given expected accuracy S, where 

WsniXHw) ~ signiHw)\\ < S . (28) 

The optimal choice of A is easily found because of the inversion symmetry of 
e„(x): A/imin = -j-r" — where /imin, max are the square roots of the minimal 
and maximal eigenvalues of ifw. Thus, A = ,, "'", so the range over 
which the sign function is needed is 

^ < |a;| < K^ . (29) 

For en{x) to be an acceptable approximation we need then n w -jt' | log('5/2)|. 
For example, for k ~ 10'* and single precision n ~ 50 is safe; for double pre- 
cision, the needed n doubles. On the other hand the number of Conjugate 
Gradient iterations will go as k^ . Note that the lowest eigenvalue of A^if^ 

2 1 

and the minimal pole shift tS-^ are of the same order when n ^ k^ . 

Comparing to traditional simulations, where k is again the source of all 
problems, it is important to note the new feature that now the parameter m 
in H\Y is taken in a different regime from before. In traditional simulations 
the parameter m is adjusted so that the physical quark mass be small. This 
precisely means that the parameter is chosen so that k be large. Because 
of fluctuations, n becomes often much larger than one would have expected, 
and that is the problem faced by traditional simulations discussed earlier. 
Here the parameter m, subject to the limitation —2 < tti < 0, can be chosen 
so that K be as small as possible. Fluctuations can still cause problems and 
sometimes make n large, but, in principle, the fluctuations are around a 
small K value, not a large one. In practice however the situation is somewhat 
marginal: coarse lattices produce too much fluctuations even for the Overlap 
Dirac Operator. But, flner lattices are manageable. Still, even on finer lattices 
one needs to split the domain over which the sign function is approximated 
into a neighborhood around zero and the rest. The neighborhood around zero 



contains of the order of ten eigenvalues, and by computing the exact projector 
on the corresponding eigenspace, the sign function is exactly evaluated there, 
leaving only the more manageable part of the spectrum to be dealt with by the 
rational approximation p^ . This is quite time consuming, and constitutes the 
new bottleneck we are facing. Variations of the probability distribution of the 
background and slight modifications of Hiy would not affect the continuum 
limit but would probably help ameliorate this problem. 



4.4 Avoiding nested Conjugate Gradient procedures 



Another obvious drawback of using the Overlap Dirac Operator in numerical 
simulations is that we eventually need to compute vectors of the form ttt/V' 
for a given ijj. This requires another inversion and we end up with a two level 
nested Conjugate Gradient procedure, whereas in the traditional simulations 
we only had one. The outer Conjugate Gradient is going to be better condi- 
tioned (for light but still massive quarks) then in the traditional approach. 
So, we are facing a tradeoff issue, one that has not been resolved yet. The 
two levels of Conjugate Gradients are presently dealt with separately, but 
clearly, eventually, potential gains will be obtained from the fact that the 
inner procedure does not have to be run to high accuracy for the initial steps 
in the outer procedure. 

There exists another trick ||l^ to simulate the rational e„(AiJvK) which 
does away with the need of employing a nested conjugate gradient altogether, 
but at the expense of memory usage linear in n. Basically, the point is to find a 
Conjugate Gradient algorithm operating in an enlarged space and producing 
the vector jTv"ip in one blow. This trick is based on two observations: 



1 . Any rational approximation can be written as a truncated continued frac- 
tion. 

2. Any truncated continued fraction can be represented by a Gaussian Path 
Integral of a fermionic system living on a chain of length n, where n is 
the depth of the truncated continued fraction. 

This method is applicable to any rational approximation. Below is one 
way to apply it to SniHw), where the scale factor is absorbed in Hw for 
notational simplicity. 

First, the rational approximation has to be written in the form of a con- 
tinued fraction with entries preferably linear in Hw- I start from a formula 
that goes as far back as Euler, and subsequently use the invariance under 
inversion of x to move the x factors around, so that the entries become linear 



m X. 



£n{x) 



2nx 
(4n^ - l)x'^ 



(4n^ - 4)a;-' 



(30) 
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Now, with the help of extra fields, I write a Gaussian path integral which 
induces the desired action between a chosen subset of fields: 



d(t>id4>id(t>2d(t>2 ■ ■ ■ d(t>nd(t>ne^' = (deti7i^)2"e^'^(^=+''"(-f^"'))'^ . (31) 



The quadratic action 5* couples the extended fermionic fields x? X- 



X = ( -0 </>! 



»2n j , X 



(t ^ 



\ 4)211 ) 



(32) 



Si, = xHx, where the new kernel, H, has the following block structure: 
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(-1^ 
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■ Hw 









\ 


^a2n- 
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(33) 



^/a2n-l —Hw / 



The numerical coefficients a are given below: 



(2n-j)(2n + j) . 

an = 2ri, a, = ^ -, 7 = 1,2, ... 

■' (2j-l)(2j + l)' ■' 



(34) 



The hope is that the condition number of H will be manageable. The basic 
point is that up to a scalar factor, the (1,1) diagonal block of the inverse 
H""'^ is equal to -rryl5- H is sparse and the evaluation of H^^x requires 
one single Conjugate Gradient procedure, albeit one acting on vectors 2n 
times longer than needed. Although I used Path Integrals to get the relation 
between H and the Overlap Dirac Operator, there is nothing more to the 
derivation than ordinary Linear Algebra, and Path Integrals could have been 
bypassed altogether. 

So, at the expense of adding extra fields one can avoid a nested Conjugate 
Gradient procedure. This would be particularly important when dynamical 



fermions are simulated. The chain version of the direct truncation of the Over- 
lap Dirac Operator is similar in appearance to another truncation, usually 
referred to as "domain walls" p3{ . 

The proposal above, employing chains, has two potential advantages over 
domain walls. First, it is much more flexible, allowing one to change both 
the rational approximation one uses and its chain implementation. This flex- 
ibility ought to allow various improvements. Second, since here the argument 
of the approximated sign function is XHw , not the rather cumbersome loga- 
rithm of the transfer matrix of the domain wall case, eigenstates of Hw with 
small eigenvalues can be eliminated by projection with greater ease. This 
elimination, although costly numerically, vastly increases the accuracy of the 
approximation to the sign function. Actually, at this stage of the game and at 
practical gauge coupling values, the use of projectors seems to be numerically 
indispensable to direct implementations of the Overlap Dirac Operator. Al- 
though no projectors have been implemented in domain wall simulations and 
physical results have been claimed, to me it seems doubtful that the domain 
wall version of truncating the Overlap will really be capable to get to small 
quark masses without employing some technique equivalent to projection. 
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